Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing dropwizard emitter for druid #7363

Merged
merged 16 commits into from
Oct 1, 2019

Conversation

nishantmonu51
Copy link
Member

This is an update for previous PR - #4196
Corresponding issues are
#3927
#1996

This extension integrates Dropwizard metrics library with druid so that dropwizard users can easily absorb druid into their monitoring ecosystem. It accumulates druid metrics as dropwizard metrics, and emits them to various sinks via dropwizard supported reporters.

Currently supported dropwizard metrics types counter, gauge, meter, timer and histogram.
These metrics can be emitted using either Console or JMX reporter.

Support for other reporters can be easily added in future PRs.

Note - I have cherry-picked the commit from previous author to provide him the credit of his work.

@nishantmonu51
Copy link
Member Author

@jihoonson @b-slim: Can you please help with the review here ?
It will be a nice addition to 0.15 release.

## Introduction

This extension integrates [Dropwizard](http://metrics.dropwizard.io/3.1.0/getting-started/#) metrics library with druid so that dropwizard users can easily absorb druid into their monitoring ecosystem.
It accumulates druid metrics as dropwizard metrics, and emits them to various sinks via dropwizard supported reporters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is scary, how much it accumulates is this onheap or offheap? is there a config to bound this ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the accumulation does not mean it queues things up, it stores the latest values in the metrics registry, the memory usage depends on the number of metrics and metric types.

Note: We are not handling all individual query metrics here, it's just the metrics defined in the extension config which are counters, gauges and timers. So have minimal impact on heap usage.e.g. for a counter it uses a single AtomicLong, for gauge it gives an instantaneous value and we are caching that in the metrics extensions impl.

https://metrics.dropwizard.io/4.0.0/getting-started.html has more details on the individual event types.

|--------|-----------|---------|-------|
|`druid.emitter.dropwizard.reporters`|List of dropwizard reporters to be used.|yes|none|
|`druid.emitter.dropwizard.prefix`|Optional prefix to be used for metrics name|no|none|
|`druid.emitter.dropwizard.includeHost`|Flag to include the hostname as part of the metric name.|no|yes|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this includes host and ports ? or only host?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host and port both, It is essentially ServiceMetricEvent.getHost() value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please make the doc more clear that it includes both host and port?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

|`druid.emitter.dropwizard.reporters`|List of dropwizard reporters to be used.|yes|none|
|`druid.emitter.dropwizard.prefix`|Optional prefix to be used for metrics name|no|none|
|`druid.emitter.dropwizard.includeHost`|Flag to include the hostname as part of the metric name.|no|yes|
|`druid.emitter.dropwizard.includeDimensionNames`|Flag to include the dimension names as part of the metric name.|no|yes|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am trying to understand here, why a used would like to get a metric without dimension name? am not sure am getting this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is mostly a copy of other stats-d emitter extension and left over of that, Agree, It is not much useful for dropwizard, will remove

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

|`druid.emitter.dropwizard.prefix`|Optional prefix to be used for metrics name|no|none|
|`druid.emitter.dropwizard.includeHost`|Flag to include the hostname as part of the metric name.|no|yes|
|`druid.emitter.dropwizard.includeDimensionNames`|Flag to include the dimension names as part of the metric name.|no|yes|
|`druid.emitter.dropwizard.dimensionMapPath`|Path to JSON file defining the StatsD type, and desired dimensions for every Druid metric|no|Default mapping provided. See below.|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is statsD a typo mistake or it is actually called statsD ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left over of copy and paste, fixed.


Each metric emitted using Dropwizard must specify a type, one of `[timer, counter, guage, meter, histogram, ]`. Dropwizard Emitter expects this mapping to
be provided as a JSON file. Additionally, this mapping specifies which dimensions should be included for each metric.
If the user does not specify their own JSON file, a default mapping is used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a link to the default mapping ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

Each metric emitted using Dropwizard must specify a type, one of `[timer, counter, guage, meter, histogram, ]`. Dropwizard Emitter expects this mapping to
be provided as a JSON file. Additionally, this mapping specifies which dimensions should be included for each metric.
If the user does not specify their own JSON file, a default mapping is used.
All metrics are expected to be mapped. Metrics which are not mapped will log an error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about ignored or warning ? seems like error is too much, especially in Druid we add metrics pretty often.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to ignored.

metricMap = readMap(mapper, dimensionMapPath);
}

@Nullable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add docs on how and why this will return null ? plus i see this nullable annotation all over the Druid code, am not sure what is used for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added javadoc. null is returned when there is no mapping for a metric.

String service,
String metric,
Map<String, Object> userDims,
Map<String, String> builder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reading the signature of this method i can not tell what is actually doing and why a map object is called a builder ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed the builder to filteredDimensions and mentioned it in javadoc.

This is because some metrics are reported differently, but with the same name, from different services.
*/
DropwizardMetricSpec metricSpec = null;
if (metricMap.containsKey(metric)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do this in one get and a null check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

}
return metricSpec;
} else {
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the meaning of null ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it means no mapping found for metric

{
}).readValue(is);
}
catch (IOException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about closing the stream on exception ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why is is not closed on normal exit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought ObjectReader.readValue will close the stream in normal flow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ObjectReader.readValue(..) doesn't mention closing the InputStream by itself, it might be safer to close it in finally block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.



@Override
public void start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why we are not guarding this and check for start not set ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added guards.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, when can it be called twice? It should be managed by the lifecycle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it should be called through lifecycle, It seems other emitter has guards too so i added a it here also.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LifecycleLock utility is useful for this usecase

public class DropwizardConverter
{
private static final Logger log = new Logger(DropwizardConverter.class);
private Map<String, DropwizardMetricSpec> metricMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


public class DropwizardEmitter implements Emitter
{
private static Logger log = new Logger(DropwizardEmitter.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private static final

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


@Override
public void close()
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as for start, shouldn't we check for start is true ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added guard

public void flush()
{
for (DropwizardReporter reporter : reporters) {
reporter.flush();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make sure this is not blocking?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is non-blocking, basically a no-op for the current two implementations of reporters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could mention that it must be non blocking in the javadoc of DropwizardReporter.flush(), so that if someone implements another then they know about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

final Injector injector
)
{
List<Emitter> alertEmitters = Lists.transform(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use native JDK functions/streams and move away from guava.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@Override
public void close()
{

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing to close ? this reporter has not state to clean ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
~ Licensed to Metamarkets Group Inc. (Metamarkets) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why this change is needed ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -126,6 +126,9 @@
* <p>
* The RemoteTaskRunner uses ZK for job management and assignment and http for IPC messages.
*/


//KubernetesTaskRunner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

@b-slim
Copy link
Contributor

b-slim commented Apr 25, 2019

@nishantmonu51 am very ignorant of this DropWizard stuff, can you please explain where the metrics are buffered ? and how often cleaned ? how we can make sure it does not overflow ? and how the JMX one works do we need some Jvm flags to be set to turn it on can you please document how we can test that ?

private final DropwizardEmitterConfig config;
// Note: the gauges do not represent the actual instantaneous value for the metrics.
// Instead they have the last known value for it.
private final ConcurrentHashMap<String, Number> gagues = new ConcurrentHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if let say we have a metric where the set of the key grows over time eg queryId or something similar, this map will grow without any bound and that can lead to big troubles, how about making the keys weak refs or having an actual limit on the size of this map ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking into it, we can add a limit to the map size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a limit to the size of map.

@stale
Copy link

stale bot commented Jun 25, 2019

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

@stale stale bot added the stale label Jun 25, 2019
@stale
Copy link

stale bot commented Jun 26, 2019

This pull request/issue is no longer marked as stale.

1 similar comment
@stale
Copy link

stale bot commented Jun 26, 2019

This pull request/issue is no longer marked as stale.

@nishantmonu51
Copy link
Member Author

@jihoonson @b-slim : Updated PR based on review comments, please check.

Used to report druid metrics via JMX.
```

druid.emitter.dropwizard.reporter={"type":"jmx"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
druid.emitter.dropwizard.reporter={"type":"jmx"}
druid.emitter.dropwizard.reporters=[{"type":"jmx"}]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don e.


```

druid.emitter.dropwizard.reporter={"type":"console","emitIntervalInSecs":30}"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
druid.emitter.dropwizard.reporter={"type":"console","emitIntervalInSecs":30}"}
druid.emitter.dropwizard.reporters=[{"type":"console","emitIntervalInSecs":30}"}]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

```

### Default Metrics Mapping
```json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is possible to point this to actual file in the source code because this has the potential to go out of sync

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a link too. Initially it was a link only but added the file as part of review comments.

nameBuilder.add(config.getPrefix());
}
nameBuilder.add("metric=" + metric);
nameBuilder.add("service=" + service);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should use string format instead of concatenations here and in all other places as this code is gonna be called many many times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@JsonProperty("maxGaugeCount") Integer maxGaugeCount
)
{
this.reporters = reporters;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null/empty check ? it would be useless to set it up with no reporters and probably a configuration error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

private final DropwizardEmitterConfig config;
// Note: the gauges do not represent the actual instantaneous value for the metrics.
// Instead they have the last known value for it.
private final Map<String, Number> gagues;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private final Map<String, Number> gagues;
private final Map<String, Number> gauges;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

this.config = config;
this.reporters = config.getReporters();
this.converter = new DropwizardConverter(mapper, config.getDimensionMapPath());
this.gagues = Collections.synchronizedMap(new GaugesCache<>(config.getMaxGaugeCount()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we use guava Cache instead of doing our own ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. replaced it use caffeine cache instead.

@himanshug
Copy link
Contributor

Copy link
Contributor

@himanshug himanshug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix the build

@himanshug himanshug merged commit 8537fbe into apache:master Oct 1, 2019
@jon-wei jon-wei added this to the 0.17.0 milestone Dec 17, 2019
@jon-wei jon-wei mentioned this pull request Dec 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants